Replicability Crisis in Science?
Branden Fitelson
Filippo Gambarota
Giovanni Parmigiani
8-10 July 2024
A tree of possibilities…
… Where only some of them produce a certain result.
Not all scenarios are equally plausible or reasonable. Thus the multiverse is not the entire set of possibilities but a non-random sample of plausible choices.
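One way to make this concrete is to enumerate the full grid of analytic choices and then drop the combinations judged implausible. The sketch below does this in base R; the factor names, their levels, and the exclusion rule are illustrative assumptions, not the choices of any real analysis.

```r
# Hypothetical analytic choices (names and levels are illustrative assumptions)
choices <- list(
  rho      = c(0.3, 0.5, 0.7),      # assumed pre-post correlation
  model    = c("fixed", "random"),  # meta-analytic model
  outliers = c("keep", "remove")    # outlier handling
)

# Full tree of possibilities: every combination of every choice
full_grid <- expand.grid(choices, stringsAsFactors = FALSE)

# Keep only the combinations judged plausible; here we (arbitrarily) assume
# that a fixed-effect model with outliers kept is not a reasonable scenario
plausible <- subset(full_grid, !(model == "fixed" & outliers == "keep"))

nrow(full_grid)  # 12 scenarios in the full grid
nrow(plausible)  # 9 scenarios retained in the multiverse
```

The exclusion rule is where substantive judgment enters: the retained rows are a deliberate, non-random subset of the full grid.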
Although useful and very powerful, meta-analysis involves several (arbitrary) choices. For example:
With a pre-post Cohen’s \(d\) we need the pre-post correlation \(\rho\) to calculate the sampling variance:
\[ \sigma^2_{\epsilon_{pp}} = \frac{2(1 - \rho)}{n} + \frac{d^2}{2n} \]
\(\rho\) is usually not reported and needs to be chosen from previous literature or set to a plausible guess.
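To see how much this choice matters, the sketch below evaluates the formula above for a few candidate values of \(\rho\). The values of \(d\) and \(n\) and the helper name `var_d_prepost` are illustrative assumptions.

```r
# Sampling variance of a pre-post Cohen's d (formula from the text)
var_d_prepost <- function(rho, d, n) {
  2 * (1 - rho) / n + d^2 / (2 * n)
}

# Illustrative values (assumptions): d = 0.5, n = 40
rho_grid <- c(0.3, 0.5, 0.7, 0.9)
v <- var_d_prepost(rho_grid, d = 0.5, n = 40)

# Inverse-variance weight each choice would give the study in a meta-analysis
data.frame(rho = rho_grid, variance = v, weight = 1 / v)
```

With these inputs, moving from \(\rho = 0.3\) to \(\rho = 0.9\) shrinks the variance from about 0.038 to about 0.008, so the same study would receive roughly 4.7 times more weight.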
v_dm <- function(r, d, n){
2*(1-r)/n + d^2/(2*n)
}
vdd <- data.frame(x = seq(-1, 1, 0.001))
vdd$y <- v_dm(vdd$x, 0, 40)
ggplot(vdd, aes(x = x, y = y)) +
geom_line() +
xlab(expression(rho)) +
ylab(expression(sigma^2)) +
theme_minimal(25)
p_beta <- ggplot(pimaex, aes(y = b)) +
xlim(c(-0.5,0.5)) +
geom_boxplot(width = 0.5,
fill = "dodgerblue",
alpha = 0.6) +
ylab(latex2exp::TeX("$\\mu_{\\,\\, \\theta}$")) +
theme_minimal(30) +
theme(axis.text.x = element_blank())
p_pval <- ggplot(pimaex, aes(y = -log10(p.value))) +
xlim(c(-0.5,0.5)) +
geom_hline(yintercept = -log10(0.05)) +
geom_boxplot(width = 0.5,
fill = "dodgerblue",
alpha = 0.6) +
ylab(latex2exp::TeX("Raw $-log_{10}(p)$")) +
theme_minimal(30) +
theme(axis.text.x = element_blank())
cowplot::plot_grid(p_beta, p_pval, nrow = 1)
Discrepancies in results could be due to:
Hall et al. (2022) and Liu et al. (2021) proposed several tools to describe the results of a multiverse analysis.
The increase in complexity needs to be managed with appropriate tools to summarise and visualise the results.
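Before plotting, a compact numerical summary of the whole multiverse can already be informative. The sketch below uses simulated results; the data frame `res` is hypothetical, and only its column names (`b`, `ci.lb`, `ci.ub`) mirror those used in the plots.

```r
set.seed(42)

# Hypothetical multiverse output: one row per specification with the pooled
# estimate b and its confidence interval (simulated, for illustration only)
res <- data.frame(b = rnorm(200, mean = 0.2, sd = 0.1))
res$ci.lb <- res$b - 0.15
res$ci.ub <- res$b + 0.15

# Compact numerical summary of the whole multiverse
summary_mv <- c(
  n_specs     = nrow(res),
  median_b    = median(res$b),
  q05_b       = unname(quantile(res$b, 0.05)),
  q95_b       = unname(quantile(res$b, 0.95)),
  prop_pos    = mean(res$b > 0),                      # share of positive estimates
  prop_excl_0 = mean(res$ci.lb > 0 | res$ci.ub < 0)   # share of CIs excluding 0
)
round(summary_mv, 3)
```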
specif_res |>
ggplot(aes(x = b)) +
geom_histogram(bins = 50,
fill = "dodgerblue",
col = "black") +
ylab("Frequency") +
xlab("Estimated Effect") +
geom_vline(xintercept = 0, col = "red") +
geom_vline(xintercept = 0.25, col = "green")
above <- specif_res |>
arrange(b) |>
mutate(id = 1:n()) |>
ggplot(aes(x = id, y = b)) +
geom_segment(aes(xend = id, y = ci.lb, yend = ci.ub)) +
geom_hline(yintercept = 0, col = "red") +
geom_hline(yintercept = 0.25, col = "green") +
theme(axis.title.y = element_blank(),
axis.text.x = element_blank(),
axis.title.x = element_blank())
below <- specif_res |>
arrange(b) |>
mutate(id = 1:n()) |>
pivot_longer(c(
target_group,
format,
diagnosis,
risk_of_bias,
type,
method
)) |>
ggplot(aes(x = id, y = value)) +
geom_raster() +
theme(axis.title.y = element_blank(),
axis.text.x = element_blank()) +
xlab("Specification")
cowplot::plot_grid(above, below, nrow = 2, align = "hv")
We can inspect the effect of a set of scenarios sharing a certain parameter:
specif_res |>
arrange(b) |>
mutate(id = 1:n()) |>
ggplot(aes(x = b, fill = diagnosis)) +
geom_density(alpha = 0.5)
The idea of the specification curve is both descriptive (see previous plots) and inferential.
Inferentially, the approach requires generating new datasets under the null hypothesis and comparing the results with the observed value.
Plessen et al. (2023) describe how to implement the method for meta-analysis.
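The logic can be sketched in a few lines of base R. This is a simplified illustration, not the implementation of Plessen et al. (2023): the data are simulated, the four specifications are toy choices, and the null datasets are drawn assuming a true effect of zero with the observed sampling variances.

```r
set.seed(2024)

# Hypothetical data: 20 studies with effect sizes yi and sampling variances vi
k  <- 20
vi <- runif(k, 0.01, 0.05)
yi <- rnorm(k, mean = 0.25, sd = sqrt(vi))
low_rob <- rep(c(TRUE, FALSE), length.out = k)  # assumed risk-of-bias flag

# Four toy specifications: {all studies, low risk of bias} x {weighted, unweighted}
run_specs <- function(yi, vi) {
  subsets <- list(all = rep(TRUE, k), low_rob = low_rob)
  est <- c()
  for (s in subsets) {
    est <- c(est,
             weighted.mean(yi[s], 1 / vi[s]),  # inverse-variance pooling
             mean(yi[s]))                      # unweighted mean
  }
  est
}

# Observed summary of the specification curve: median estimate across specs
obs_median <- median(run_specs(yi, vi))

# Null distribution: regenerate effects with true effect 0, rerun all specs
null_median <- replicate(2000, {
  yi_null <- rnorm(k, mean = 0, sd = sqrt(vi))
  median(run_specs(yi_null, vi))
})

# Two-sided p value: how often a null curve is as extreme as the observed one
p_value <- (sum(abs(null_median) >= abs(obs_median)) + 1) / (length(null_median) + 1)
p_value
```

The key design choice is that every null dataset is pushed through all specifications, so the reference distribution reflects the dependence among them.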
specif_res <- specif_res |>
arrange(b) |>
mutate(id = 1:n())
ggplot(data = specif$SD,
aes(x = id, y = b)) +
geom_line(aes(group = spec)) +
geom_line(data = specif_res,
aes(x = id, y = b),
col = "firebrick",
lwd = 2)
Post-selection Inference in Multiverse Analysis (PIMA) is an inferential approach to multiverse analysis. It provides a framework for:
We also implemented PIMA for meta-analysis, but (for the moment) only for two-level models without moderators.
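The max-T correction behind the adjusted p values can be illustrated with a generic sign-flipping sketch. This is not the PIMA implementation itself: it uses simple one-sample t statistics on simulated data, and the three specifications (raw scores, signed ranks, signs) are assumptions chosen only because each is symmetric under sign flips.

```r
set.seed(1)

# Hypothetical data: n paired differences analysed under K = 3 specifications
n <- 30
x <- rnorm(n, mean = 0.4)
specs <- cbind(raw = x,
               signed_rank = sign(x) * rank(abs(x)),
               sign = sign(x))

# One-sample t statistic for each specification (column)
t_stat <- function(m) colMeans(m) / (apply(m, 2, sd) / sqrt(nrow(m)))
t_obs <- abs(t_stat(specs))

# Sign flipping: the same random signs are applied to every specification,
# which preserves the dependence between them under the null
B <- 2000
t_null <- t(replicate(B, abs(t_stat(specs * sample(c(-1, 1), n, replace = TRUE)))))

# Raw p value: each specification against its own null distribution
p_raw <- sapply(seq_along(t_obs), function(j) (sum(t_null[, j] >= t_obs[j]) + 1) / (B + 1))

# maxT-adjusted p value: each specification against the null of the maximum
max_null <- apply(t_null, 1, max)
p_adj <- sapply(t_obs, function(t) (sum(max_null >= t) + 1) / (B + 1))

round(rbind(p_raw, p_adj), 3)
```

Because the maximum over specifications is never smaller than any single statistic, each adjusted p value is at least as large as its raw counterpart, which is why points in a raw-versus-corrected plot cannot indicate stronger evidence after correction.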
tp <- function(p){
-log10(p)
}
lbl <- c("Never $p \\leq 0.05$",
"Before correction $p \\leq 0.05$",
"After correction $p \\leq 0.05$")
pimaex$sign <- factor(pimaex$sign,
labels = latex2exp::TeX(lbl))
ggplot(pimaex, aes(x = tp(p.value), tp(adjust.maxt), color = sign)) +
geom_hline(yintercept = tp(0.05), alpha = 0.4) +
geom_vline(xintercept = tp(0.05), alpha = 0.4) +
geom_abline(linetype = "dashed", lwd = 0.5) +
geom_point(size = 5,
position = position_jitter(width = 0.1),
alpha = 0.5) +
xlim(c(0, 3)) +
ylim(c(0, 3)) +
xlab(latex2exp::TeX("Raw p value ($-log_{10}$)")) +
ylab(latex2exp::TeX("Corrected p value ($-log_{10}$)")) +
theme_minimal(base_size = 20) +
theme(legend.title = element_blank(),
legend.position = "bottom") +
scale_color_manual(labels = scales::parse_format(), values = c("#F8766D", "#7CAE00", "#00BFC4"))
multi_cit <- data.frame(
year = c(2024L,2023L,2022L,2021L,2020L,2019L,
2018L,2017L,2016L,2015L,2014L,2013L),
papers = c(30L, 46L, 33L, 19L, 10L, 11L, 4L, 2L, 3L, 0L, 1L, 1L),
total = c(652674L,1295092L,1317981L,1299609L,
1161910L,1044346L,955091L,907118L,874112L,840256L,804004L,
777242L)
)
multi_cit |>
mutate(year = as.integer(year)) |>
ggplot(aes(x = year, y = papers/total)) +
geom_line() +
geom_label(aes(label = papers)) +
scale_x_continuous(breaks = seq(2012, 2024, 1)) +
theme(axis.title.x = element_blank(),
axis.text.y = element_blank(),
axis.ticks.y = element_blank()) +
ylab("# papers on Multiverse")